Consider a two-sided z-test on a proportion \(p\) based on \(x\) successes in \(n\) trials. The null hypothesis is that \(p = p_0\) and the alternative hypothesis is that \(p != p_0\). The test says to reject the null hypothesis whenever the test statistic \(z = (\hat{p} - p_0) / \sqrt{(p_0 * (1 - p_0) / n)}\) is greater than or equal to 1.96 in absolute value (\(\hat{p} = x/n\)). This test is reported to have an \(\alpha = 0.05\) level of significance, but the actual probability of a Type I error (rejecting the null hypothesis when in fact it is true) may be different from \(\alpha\) because:

  1. The test statistic is only approximately normally distributed
  2. The actual distribution of the test statistic is discrete

Compute the actual probability of a Type I error for the procedure described for all combinations of \(p = [0.1, 0.3, 0.5]\) and \(n = [20, 30, ..., 100]\). Present the results in a 9 x 3 matrix, with the sample \(n\) going down the rows and the values of \(p\) across the columns.

Assess the Monte Carlo error with a 95% confidence interval for each combination of \(p\) and \(n\). For which combinations does the lower bound on the confidence interval exceed \(\alpha = 0.05\)?

rr simulate <- function(n, p, alpha = 0.05, n_reps = 100000) { x <- rbinom(n_reps, n, p) p_hat <- x / n z <- (p_hat - p) / sqrt(p * (1 - p)/n) # What proportion of times did we commit a Type I error at the given alpha level mean(abs(z) >= qnorm(1 - alpha/2)) }

simulate_s(30, .3)
[1] 0.07007
simulate(30, .3)
[1] 0.07187

rr microbenchmark::microbenchmark(simulate_s(30, .3), simulate(30, .3), times = 20)

Unit: milliseconds
est
      p=0.1 p=0.3 p=0.5
n=20     NA    NA    NA
n=30     NA    NA    NA
n=40     NA    NA    NA
n=50     NA    NA    NA
n=60     NA    NA    NA
n=70     NA    NA    NA
n=80     NA    NA    NA
n=90     NA    NA    NA
n=100    NA    NA    NA
est
        p=0.1   p=0.3   p=0.5
n=20  0.04264 0.02490 0.04208
n=30  0.02601 0.07108 0.04334
n=40  0.05663 0.05569 0.03798
n=50  0.02962 0.04242 0.06454
n=60  0.04759 0.06557 0.05335
n=70  0.06754 0.04959 0.04177
n=80  0.03650 0.03697 0.05688
n=90  0.05007 0.05018 0.04294
n=100 0.06202 0.06177 0.05954
lower > alpha | upper < alpha
      p=0.1 p=0.3 p=0.5
n=20   TRUE  TRUE  TRUE
n=30   TRUE  TRUE  TRUE
n=40   TRUE  TRUE  TRUE
n=50   TRUE  TRUE  TRUE
n=60   TRUE  TRUE  TRUE
n=70   TRUE FALSE  TRUE
n=80   TRUE  TRUE  TRUE
n=90  FALSE FALSE  TRUE
n=100  TRUE  TRUE  TRUE
LS0tCnRpdGxlOiAiWiBUZXN0IG9uIFByb3BvcnRpb25zIgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpgYGB7ciBzZXR1cCwgaW5jbHVkZSA9IEZBTFNFfQprbml0cjo6b3B0c19jaHVuayRzZXQoY2FjaGUgPSBUUlVFKQpgYGAKCkNvbnNpZGVyIGEgdHdvLXNpZGVkIHotdGVzdCBvbiBhIHByb3BvcnRpb24gJHAkIGJhc2VkIG9uICR4JCBzdWNjZXNzZXMgaW4gJG4kCnRyaWFscy4gVGhlIG51bGwgaHlwb3RoZXNpcyBpcyB0aGF0ICRwID0gcF8wJCBhbmQgdGhlIGFsdGVybmF0aXZlIGh5cG90aGVzaXMKaXMgdGhhdCAkcCAhPSBwXzAkLiBUaGUgdGVzdCBzYXlzIHRvIHJlamVjdCB0aGUgbnVsbCBoeXBvdGhlc2lzIHdoZW5ldmVyIHRoZSB0ZXN0CnN0YXRpc3RpYyAkeiA9IChcaGF0e3B9IC0gcF8wKSAvIFxzcXJ0eyhwXzAgKiAoMSAtIHBfMCkgLyBuKX0kIGlzIGdyZWF0ZXIgdGhhbgpvciBlcXVhbCB0byAxLjk2IGluIGFic29sdXRlIHZhbHVlICgkXGhhdHtwfSA9IHgvbiQpLiBUaGlzIHRlc3QgaXMgcmVwb3J0ZWQgdG8KaGF2ZSBhbiAkXGFscGhhID0gMC4wNSQgbGV2ZWwgb2Ygc2lnbmlmaWNhbmNlLCBidXQgdGhlIGFjdHVhbCBwcm9iYWJpbGl0eSBvZiBhIApUeXBlIEkgZXJyb3IgKHJlamVjdGluZyB0aGUgbnVsbCBoeXBvdGhlc2lzIHdoZW4gaW4gZmFjdCBpdCBpcyB0cnVlKSBtYXkgYmUgCmRpZmZlcmVudCBmcm9tICRcYWxwaGEkIGJlY2F1c2U6CgoxLiBUaGUgdGVzdCBzdGF0aXN0aWMgaXMgb25seSBhcHByb3hpbWF0ZWx5IG5vcm1hbGx5IGRpc3RyaWJ1dGVkCjIuIFRoZSBhY3R1YWwgZGlzdHJpYnV0aW9uIG9mIHRoZSB0ZXN0IHN0YXRpc3RpYyBpcyBkaXNjcmV0ZQoKQ29tcHV0ZSB0aGUgYWN0dWFsIHByb2JhYmlsaXR5IG9mIGEgVHlwZSBJIGVycm9yIGZvciB0aGUgcHJvY2VkdXJlIGRlc2NyaWJlZCBmb3IKYWxsIGNvbWJpbmF0aW9ucyBvZiAkcCA9IFswLjEsIDAuMywgMC41XSQgYW5kICRuID0gWzIwLCAzMCwgLi4uLCAxMDBdJC4gUHJlc2VudAp0aGUgcmVzdWx0cyBpbiBhIDkgeCAzIG1hdHJpeCwgd2l0aCB0aGUgc2FtcGxlICRuJCBnb2luZyBkb3duIHRoZSByb3dzIGFuZCB0aGUgCnZhbHVlcyBvZiAkcCQgYWNyb3NzIHRoZSBjb2x1bW5zLgoKQXNzZXNzIHRoZSBNb250ZSBDYXJsbyBlcnJvciB3aXRoIGEgOTUlIGNvbmZpZGVuY2UgaW50ZXJ2YWwgZm9yIGVhY2ggY29tYmluYXRpb24Kb2YgJHAkIGFuZCAkbiQuIEZvciB3aGljaCBjb21iaW5hdGlvbnMgZG9lcyB0aGUgbG93ZXIgYm91bmQgb24gdGhlIGNvbmZpZGVuY2UKaW50ZXJ2YWwgZXhjZWVkICRcYWxwaGEgPSAwLjA1JD8KCmBgYHtyfQpzaW11bGF0ZV9zIDwtIGZ1bmN0aW9uKG4sIHAsIGFscGhhID0gMC4wNSwgbl9yZXBzID0gMTAwMDAwKSB7CiAgeCA8LSByZXBsaWNhdGUobl9yZXBzLCBzdW0oc2FtcGxlKGMoMCwgMSksIG4sIHJlcGxhY2UgPSBUUlVFLCBwcm9iID0gYygxIC0gcCwgcCkpKSkKICBwX2hhdCA8LSB4IC8gbgogIHogPC0gKHBfaGF0IC0gcCkgLyBzcXJ0KHAgKiAoMSAtIHApL24pCiAgIyBXaGF0IHByb3BvcnRpb24gb2YgdGltZXMgZGlkIHdlIGNvbW1pdCBhIFR5cGUgSSBlcnJvciBhdCB0aGUgZ2l2ZW4gYWxwaGEgbGV2ZWwKICBtZWFuKGFicyh6KSA+PSBxbm9ybSgxIC0gYWxwaGEvMikpCn0KYGBgCgoKYGBge3J9CnNpbXVsYXRlIDwtIGZ1bmN0aW9uKG4sIHAsIGFscGhhID0gMC4wNSwgbl9yZXBzID0gMTAwMDAwKSB7CiAgeCA8LSByYmlub20obl9yZXBzLCBuLCBwKQogIHBfaGF0IDwtIHggLyBuCiAgeiA8LSAocF9oYXQgLSBwKSAvIHNxcnQocCAqICgxIC0gcCkvbikKICAjIFdoYXQgcHJvcG9ydGlvbiBvZiB0aW1lcyBkaWQgd2UgY29tbWl0IGEgVHlwZSBJIGVycm9yIGF0IHRoZSBnaXZlbiBhbHBoYSBsZXZlbAogIG1lYW4oYWJzKHopID49IHFub3JtKDEgLSBhbHBoYS8yKSkKfQpgYGAKCmBgYHtyfQpzaW11bGF0ZV9zKDMwLCAuMykKc2ltdWxhdGUoMzAsIC4zKQpgYGAKCmBgYHtyfQptaWNyb2JlbmNobWFyazo6bWljcm9iZW5jaG1hcmsoc2ltdWxhdGVfcygzMCwgLjMpLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgc2ltdWxhdGUoMzAsIC4zKSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHRpbWVzID0gMjApCmBgYAoKCmBgYHtyfQpwX3NlcSA8LSBjKDAuMSwgMC4zLCAwLjUpCm5fc2VxIDwtIHNlcSgyMCwgMTAwLCBieSA9IDEwKQoKZXN0IDwtIG1hdHJpeChOQSwgbnJvdyA9IGxlbmd0aChuX3NlcSksIG5jb2wgPSBsZW5ndGgocF9zZXEpKQpkaW1uYW1lcyhlc3QpIDwtIGxpc3Qoc3ByaW50Zigibj0lZCIsIG5fc2VxKSwKICAgICAgICAgICAgICAgICAgICAgIHNwcmludGYoInA9JS4xZiIsIHBfc2VxKSkKZXN0CmBgYAoKYGBge3J9Cm5fcmVwcyA8LSAxMDAwMDAKIyBJdGVyYXRlIG92ZXIgcm93cwpmb3IgKGkgaW4gMTpsZW5ndGgobl9zZXEpKSB7CiAgIyBJdGVyYXRlIG92ZXIgY29sdW1ucwogIGZvciAoaiBpbiAxOmxlbmd0aChwX3NlcSkpIHsKICAgIGVzdFtpLGpdIDwtIHNpbXVsYXRlKG5fc2VxW2ldLCBwX3NlcVtqXSwgbl9yZXBzID0gbl9yZXBzKQogIH0KfQoKZXN0CmBgYAoKCmBgYHtyIHRpZHksIG1lc3NhZ2U9RkFMU0V9CmxpYnJhcnkodGlkeXZlcnNlKQpyZXN1bHRzIDwtIGV4cGFuZC5ncmlkKHAgPSBwX3NlcSwgbiA9IG5fc2VxKSAlPiUgCiAgbXV0YXRlKHZhbHVlID0gbWFwMl9kYmwobiwgcCwgc2ltdWxhdGUsIG5fcmVwcyA9IG5fcmVwcykpICU+JSAKICBzcHJlYWQoa2V5ID0gcCwgdmFsdWUgPSB2YWx1ZSkKCnJlc3VsdHMKYGBgCgpgYGB7ciBjaX0KY2kgPC0gcW5vcm0oMC45NzUpICogc3FydChlc3QgKiAoMSAtIGVzdCkvbl9yZXBzKQpsb3dlciA8LSBlc3QgLSBjaQp1cHBlciA8LSBlc3QgKyBjaQoKYWxwaGEgPC0gMC4wNQp1cHBlcgpsb3dlciA+IGFscGhhIHwgdXBwZXIgPCBhbHBoYQpgYGAKCgo=